[Misc] Add gemma3 chat template with pythonic-style function calling #17149
base: main
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
@philipchung please sign the DCO.
…-call-parser=pythonic) Signed-off-by: Philip Chung <[email protected]>
@paolovic I've signed the DCO now.
Thanks for adding it, but this caused hanging when the tool parser failed on some results, especially with the 4b model.
LGTM! 👍
I've tested the function calling flow with the changes in this PR, and it appears to be working correctly. The model successfully identified the need for a function call based on the user prompt and the provided tool definition, extracted the necessary arguments, and then generated an appropriate final response after receiving the function's result.
Here's a summary of the test case and results:
<bos><start_of_turn>user
Tools (functions) are available. If you decide to invoke one or more of the tools, you must respond with a python list of the function calls.
Example Format: [func_name1(params_name1=params_value1, params_name2=params_value2...), func_name2(params)]
Do not use variables. DO NOT USE MARKDOWN SYNTAX. You SHOULD NOT include any other text in the response if you call a function. If none of the functions can be used, point it out. If you lack the parameters required by the function, also point it out.
Here is a list of functions in JSON format that you can invoke.
[
  {
    "type": "function",
    "function": {
      "name": "get_current_weather",
      "description": "Get the current weather in a specified location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g., San Francisco, CA"
          },
          "unit": {
            "type": "string",
            "enum": [
              "celsius",
              "fahrenheit"
            ],
            "description": "The unit of temperature"
          }
        },
        "required": [
          "location"
        ]
      }
    }
  }
]
How's the weather in Seoul?<end_of_turn>
<start_of_turn>model
<bos><start_of_turn>user
How's the weather in Seoul?<end_of_turn>
<start_of_turn>model
[get_current_weather(location="Seoul", unit="celsius")]<end_of_turn>
<start_of_turn>user
<tool_response>
{"location": "Seoul", "temperature": 22, "unit": "celsius", "forecast": ["sunny", "windy"], "humidity": 60}</tool_response><end_of_turn>
<start_of_turn>model
2025-05-02 18:15:47.532 | INFO | __main__:test_function_call:87 - Messages: [
  {
    "role": "user",
    "content": "How's the weather in Seoul?"
  }
]
2025-05-02 18:15:47.532 | INFO | __main__:test_function_call:88 - Tools: [
  {
    "type": "function",
    "function": {
      "name": "get_current_weather",
      "description": "Get the current weather in a specified location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g., San Francisco, CA"
          },
          "unit": {
            "type": "string",
            "enum": [
              "celsius",
              "fahrenheit"
            ],
            "description": "The unit of temperature"
          }
        },
        "required": [
          "location"
        ]
      }
    }
  }
]
2025-05-02 18:15:47.961 | WARNING | __main__:test_function_call:96 - Tool call detected!
2025-05-02 18:15:47.961 | DEBUG | __main__:test_function_call:101 - Function: get_current_weather
2025-05-02 18:15:47.961 | DEBUG | __main__:test_function_call:102 - Arguments: {'location': 'Seoul', 'unit': 'celsius'}
2025-05-02 18:15:47.961 | DEBUG | __main__:test_function_call:105 - Function result: {'location': 'Seoul', 'temperature': 22, 'unit': 'celsius', 'forecast': ['sunny', 'windy'], 'humidity': 60}
2025-05-02 18:15:47.962 | INFO | __main__:test_function_call:117 - Continuing conversation with function result...
2025-05-02 18:15:48.474 | INFO | __main__:test_function_call:119 -
Final AI response: The weather in Seoul is currently 22°C. It's sunny and windy with 60% humidity.
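For readers following the logs, here is a rough sketch of how a pythonic call string such as the one the model produced can be turned into a function name and an arguments dict. It uses only Python's standard `ast` module and is not vLLM's pythonic tool parser; the helper name `parse_pythonic_tool_calls` is hypothetical and only illustrates the idea.

```python
import ast


def parse_pythonic_tool_calls(text):
    """Parse '[f(a=1), g(b="x")]' into [{'name': 'f', 'arguments': {'a': 1}}, ...].

    Hypothetical helper for illustration; not the parser shipped with vLLM.
    """
    # mode="eval" parses a single expression; malformed input raises SyntaxError.
    tree = ast.parse(text.strip(), mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("expected a Python list of function calls")

    calls = []
    for node in tree.body.elts:
        if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
            raise ValueError("each element must be a call to a plain function name")
        if node.args:
            raise ValueError("positional arguments are not handled in this sketch")
        arguments = {}
        for kw in node.keywords:
            if kw.arg is None:
                raise ValueError("**kwargs unpacking is not handled in this sketch")
            # literal_eval only accepts literals, so variables and expressions are rejected.
            arguments[kw.arg] = ast.literal_eval(kw.value)
        calls.append({"name": node.func.id, "arguments": arguments})
    return calls


calls = parse_pythonic_tool_calls(
    '[get_current_weather(location="Seoul", unit="celsius")]'
)
print(calls)
```

Because the input is parsed as Python syntax, a call list with a missing comma between keyword arguments raises `SyntaxError` instead of yielding a partial result.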
Hi, thank you for your contribution! However, I've observed on several occasions that the LLM starts with a text reply and only calls a tool at the very end, even though the prompt explicitly and clearly tells it to invoke one of the tools.
It seems that the instruction "You SHOULD NOT include any other text in the response if you call a function" is not being followed by Gemma. Do you know whether vLLM can be configured to let the model return both text and a tool invocation in the same response?
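Whether vLLM itself can be configured this way is not answered in this thread. As a purely client-side sketch of the idea, the snippet below splits a mixed output into leading text and a trailing pythonic call list using the standard `ast` module. The helper name `split_text_and_tool_calls` and the sample output string are hypothetical and were made up for illustration; they are not taken from the missing example or from vLLM.

```python
import ast
from typing import Optional, Tuple


def split_text_and_tool_calls(output: str) -> Tuple[str, Optional[str]]:
    """Return (leading_text, call_list_source) for a mixed model output.

    Tries each '[' from left to right and accepts the first one whose
    suffix parses as a non-empty Python list made only of function calls.
    Hypothetical helper; not part of vLLM.
    """
    stripped = output.strip()
    start = stripped.find("[")
    while start != -1:
        candidate = stripped[start:]
        try:
            tree = ast.parse(candidate, mode="eval")
        except (SyntaxError, ValueError):
            tree = None
        if (
            tree is not None
            and isinstance(tree.body, ast.List)
            and tree.body.elts
            and all(isinstance(elt, ast.Call) for elt in tree.body.elts)
        ):
            return stripped[:start].rstrip(), candidate
        start = stripped.find("[", start + 1)
    # No trailing call list found: treat the whole output as plain text.
    return stripped, None


# Made-up output in which text precedes the tool call.
sample = 'Let me check that for you.\n[get_current_weather(location="Seoul")]'
text, call_source = split_text_and_tool_calls(sample)
print(repr(text))
print(call_source)
```

This only handles a call list at the very end of the output; text that follows the call list would make the suffix fail to parse, and the whole output would be returned as plain text.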
This PR adds a Jinja2 chat prompt template for Gemma-3 for generating tool calls in pythonic format and is compatible with the existing vLLM pythonic tool call parser that extracts the tool calls and formats them into the tool_calls field for ChatCompletion responses as a ChatCompletionMessageToolCall.
The template is a combination of contributions from @jstangroome and @philipchung.
FIX #14734
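To make the target shape in the PR description concrete, here is a minimal sketch of what an entry in the `tool_calls` field looks like in the OpenAI chat completions format: an `id`, a `type` of `"function"`, and a `function` object whose `arguments` value is a JSON-encoded string. The helper name `to_tool_call_dicts` and the `call_` id scheme are assumptions for illustration; this is not vLLM's code.

```python
import json
import uuid


def to_tool_call_dicts(calls):
    """Convert [{'name': ..., 'arguments': {...}}] into tool_calls-style dicts.

    Hypothetical helper; the id format is an assumption, not vLLM's scheme.
    """
    return [
        {
            "id": "call_" + uuid.uuid4().hex,
            "type": "function",
            "function": {
                "name": call["name"],
                # In this format, arguments are a JSON string, not a dict.
                "arguments": json.dumps(call["arguments"]),
            },
        }
        for call in calls
    ]


tool_calls = to_tool_call_dicts(
    [
        {
            "name": "get_current_weather",
            "arguments": {"location": "Seoul", "unit": "celsius"},
        }
    ]
)
print(json.dumps(tool_calls, indent=2))
```

A client reading such a response would decode `function.arguments` with `json.loads` before invoking the tool.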